Reducing Bias of Allele Frequency Estimates by Modeling SNP Genotype Data with Informative Missingness
Abstract
The presence of missing single-nucleotide polymorphism (SNP) genotypes is common in genetic studies. For studies with low-density SNPs, the most commonly used approach to dealing with genotype missingness is to simply remove the observations with missing genotypes from the analyses. This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often has a different capability in genotyping heterozygotes and homozygotes, causing the phenomenon of "differential dropout" in the sense that the missing rates of heterozygotes and homozygotes are different. In practice, differential dropout among genotypes exists in even carefully designed studies, such as the data from the HapMap project and the Wellcome Trust Case Control Consortium. Under the assumption of Hardy-Weinberg equilibrium and no genotyping error, we here propose a statistical method to model the differential dropout among different genotypes. Compared with the naïve method, our method provides more accurate allele frequency estimates when the differential dropout is present. To demonstrate its practical use, we further apply our method to the HapMap data and a scleroderma data set.
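The bias described above can be illustrated with a small sketch. This is not the authors' estimator, whose details the abstract does not give. It is a simplified moment-style calculation under stated assumptions: Hardy-Weinberg equilibrium, no genotyping error, and (an extra assumption made here) one shared dropout rate for both homozygotes and a separate rate for heterozygotes. Under those assumptions the expected ratio of observed AA to aa counts is p^2 / (1 - p)^2 regardless of either dropout rate, so p can be recovered from the two homozygote counts alone. The function names and the example counts are hypothetical.

```python
import math


def naive_allele_freq(n_AA, n_Aa, n_aa):
    """Frequency of allele A from observed genotypes only (missing calls dropped)."""
    n_obs = n_AA + n_Aa + n_aa
    if n_obs == 0:
        raise ValueError("no observed genotypes")
    return (2 * n_AA + n_Aa) / (2 * n_obs)


def hwe_dropout_allele_freq(n_AA, n_aa):
    """Frequency of allele A from homozygote counts only.

    Assumes HWE, no genotyping error, and equal dropout for AA and aa
    (illustrative assumptions, not taken from the paper). Then
    E[n_AA] / E[n_aa] = p^2 / (1 - p)^2, so p / (1 - p) = sqrt(n_AA / n_aa).
    """
    if n_AA + n_aa == 0:
        raise ValueError("need at least one observed homozygote")
    root_AA, root_aa = math.sqrt(n_AA), math.sqrt(n_aa)
    return root_AA / (root_AA + root_aa)


# Hypothetical expected counts for N = 10000, true p = 0.3,
# homozygote dropout 2%, heterozygote dropout 10%:
#   AA: 10000 * 0.09 * 0.98 = 882
#   Aa: 10000 * 0.42 * 0.90 = 3780
#   aa: 10000 * 0.49 * 0.98 = 4802
n_AA, n_Aa, n_aa = 882, 3780, 4802

p_naive = naive_allele_freq(n_AA, n_Aa, n_aa)   # about 0.293, biased downward
p_model = hwe_dropout_allele_freq(n_AA, n_aa)   # about 0.300, the true value
print(f"naive: {p_naive:.4f}  dropout-aware: {p_model:.4f}")
```

With heterozygotes dropping out more often than homozygotes, the naive estimate undercounts the minor allele, whereas the homozygote-ratio estimate is unaffected by the heterozygote dropout rate. A full likelihood treatment would also use the heterozygote and missing counts to estimate the two dropout rates.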
Similar resources
Evaluation of ten SNP Markers for Human Identification and Paternity Analysis in Persian Population
Background: DNA markers are indispensable tools for human identification in forensic science. Single nucleotide polymorphisms (SNPs) are one category of these markers and are of particular interest for degraded DNA because of their short amplicons. Objectives: Detection of highly informative SNPs by the criteria is the essential step to devel...
Polymorphism in the interleukin-10 promoter affects both provirus load and the risk of human T lymphotropic virus type I (HTLV-I) associated myelopathy/tropical spastic paraparesis
To investigate candidate genes that influence the risk of HTLV-I associated myelopathy/tropical spastic paraparesis (HAM/TSP), we analyzed 6 single nucleotide polymorphisms (SNP) in the interleukin-10 (IL-10) promoter region. METHODS: 280 cases of HAM/TSP patients and 255 HTLV-I seropositive asymptomatic carriers (HCs) from Kagoshima, Japan were studied. All subjects gave written informed conse...
Mapping Bias Overestimates Reference Allele Frequencies at the HLA Genes in the 1000 Genomes Project Phase I Data
Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to...
Allelic and Genotypic Distribution in Single Nucleotide Polymorphism (SNP) G.676A > G of Melanocortin-1 Receptor (MC1R) Gene in Indonesian Goat Breeds
The melanocortin-1 receptor (MC1R) gene has been investigated by many studies regarding the pigmentation variation in various species. In order to determine its allelic and genotypic distribution, we sequenced the goat MC1R gene from 78 individuals in ten populations (Gembrong, Senduro, Ettawa Grade, Boerawa, Boerka, Kosta, Samosir, Muara, Boer and Kacang). Direct sequencing m...